Convergence Analysis of Gradient Descent Algorithms with Proportional Updates
Abstract
The rise of deep learning in recent years has brought with it increasingly clever optimization methods to deal with complex, non-linear loss functions [13]. These methods are often designed with convex optimization in mind, but have been shown to work well in practice even for the highly non-convex optimization associated with neural networks. However, one significant drawback of these methods when they are applied to deep learning is that the magnitude of the update step is sometimes disproportionate to the magnitude of the weights (much smaller or larger), leading to training instabilities such as vanishing and exploding gradients [12]. An idea to combat this issue is gradient descent with proportional updates.
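To make the idea concrete, here is a minimal sketch of one way an update can be made proportional to the weights. It assumes the step length is set to a fixed fraction of the weight norm, with the gradient used only for direction; the abstract does not spell out the exact rule, so the function name `proportional_update` and the parameters `lr` and `eps` are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def proportional_update(w, grad, lr=0.01, eps=1e-12):
    """One descent step of length lr * ||w|| along -grad (assumed rule)."""
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(grad)
    if g_norm < eps:
        # No usable direction: leave the weights unchanged.
        return w.copy()
    # The step length depends on the weight norm, not on the gradient magnitude.
    return w - lr * w_norm * grad / g_norm

# The same step length results from a huge gradient and a tiny one.
w = np.array([3.0, 4.0])                      # ||w|| = 5
w_big = proportional_update(w, np.array([1000.0, 0.0]), lr=0.1)
w_small = proportional_update(w, np.array([1e-6, 0.0]), lr=0.1)
step_big = np.linalg.norm(w_big - w)
step_small = np.linalg.norm(w_small - w)
```

Under this assumed rule, a step never moves the weights by more than the fraction `lr` of their current norm, which is the property the abstract describes. Note that a weight vector of exactly zero would never move, so a practical variant would need a floor on the norm.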
Related articles
Residual norm steepest descent based iterative algorithms for Sylvester tensor equations
Consider the following consistent Sylvester tensor equation \[\mathscr{X}\times_1 A + \mathscr{X}\times_2 B + \mathscr{X}\times_3 C = \mathscr{D},\] where the matrices $A, B, C$ and the tensor $\mathscr{D}$ are given and $\mathscr{X}$ is the unknown tensor. The current paper is concerned with examining a simple and neat framework for accelerating the speed of convergence of the gradient-based iterative algorithm and ...
Asymptotic and finite-sample properties of estimators based on stochastic gradients
Stochastic gradient descent procedures have gained popularity for parameter estimation from large data sets. However, their statistical properties are not well understood in theory, and in practice, avoiding numerical instability requires careful tuning of key parameters. Here, we introduce implicit stochastic gradient descent procedures, which involve parameter updates that are implicitly def...
Stochastic gradient descent algorithms for strongly convex functions at O(1/T) convergence rates
With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high-probability convergence rate of O(κ/T) for strongly convex functions, instead of O(κ ln(T)/T). We also prove that an accelerated SGD algorithm achieves a rate of O(κ/T).
Random Multi-Constraint Projection: Stochastic Gradient Methods for Convex Optimization with Many Constraints
Consider convex optimization problems subject to a large number of constraints. We focus on stochastic problems in which the objective takes the form of expected values and the feasible set is the intersection of a large number of convex sets. We propose a class of algorithms that perform both stochastic gradient descent and random feasibility updates simultaneously. At every iteration, the alg...
Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning
In this paper, we consider the minimization of a convex objective function defined on a Hilbert space, which is only available through unbiased estimates of its gradients. This problem includes standard machine learning algorithms such as kernel logistic regression and least-squares regression, and is commonly referred to as a stochastic approximation problem in the operations research communit...
Journal:
- CoRR
Volume: abs/1801.03137
Publication year: 2018